Industries and Targets

Kerry Back

Transforming the target variable

  • We should usually deal with outliers in the target variable.
  • We’ve used ranks, but there are more systematic ways to transform the target.
  • For example, we could use QuantileTransformer.
  • When we predict with the model, the predictions are mapped back (“untransformed”) to the scale of the original target.
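A minimal sketch of both points on made-up data. The returns are simulated with two extreme outliers, and LinearRegression is an arbitrary stand-in regressor chosen only for illustration. QuantileTransformer with output_distribution="normal" maps the outliers to bounded normal scores, and TransformedTargetRegressor applies the inverse transform inside predict(), so predictions come back in return units.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import QuantileTransformer

# Simulated returns: 500 ordinary values plus two extreme outliers
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 0.05, 500), [3.0, -0.9]])
X = rng.normal(size=(len(y), 1))  # a single made-up feature

# QuantileTransformer works on 2D arrays, so reshape the target to one column
qt = QuantileTransformer(n_quantiles=100, output_distribution="normal")
z = qt.fit_transform(y.reshape(-1, 1)).ravel()
print(y.max(), z.max())  # the 3.0 outlier becomes a bounded normal score

# inverse_transform maps the scores back to the original values
y_back = qt.inverse_transform(z.reshape(-1, 1)).ravel()

# The regressor is fit on normal scores; predict() inverts the transform
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=QuantileTransformer(n_quantiles=100, output_distribution="normal"),
)
model.fit(X, y)
pred = model.predict(X)  # in the units of y, within [y.min(), y.max()]
```

Because the inverse transform interpolates between the fitted quantiles, predictions can never fall outside the range of target values seen in training.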

Our usual example

  • Predict ret from roeq and mom12m in 2021-12
  • RandomForestRegressor
  • Feature steps transform1, poly, transform2 (as defined before)
  • Use GridSearchCV to find max_depth

Changes

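  • Instead of model = RandomForestRegressor(random_state=0), use model = TransformedTargetRegressor(…)
    • first argument: regressor (the random forest)
    • second argument: transformer (applied to the target)
  • param_grid = {"transformedtargetregressor__regressor__max_depth": [4, 6, 8]}
    • pipeline step name, then the regressor argument, then its max_depth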
  • pipe = … (as before)
  • cv = … (as before)
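To check the double-underscore name before running a grid search, you can list the parameters of a small pipeline. This is a sketch: StandardScaler is only a hypothetical stand-in for transform1, poly and transform2. make_pipeline names each step after its class in lowercase, and each __ steps one level into a nested estimator.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer, StandardScaler

pipe = make_pipeline(
    StandardScaler(),  # hypothetical stand-in for transform1, poly, transform2
    TransformedTargetRegressor(
        regressor=RandomForestRegressor(random_state=0),
        transformer=QuantileTransformer(output_distribution="normal"),
    ),
)

# make_pipeline derives step names from the lowercased class names
print(list(pipe.named_steps))  # ['standardscaler', 'transformedtargetregressor']

# step name __ argument name __ parameter of the wrapped estimator
key = "transformedtargetregressor__regressor__max_depth"
print(key in pipe.get_params())  # True

# GridSearchCV sets candidate values the same way
pipe.set_params(**{key: 4})
```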

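from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer

# transform1, poly, transform2 and data are defined as in the usual example

# transformer applied to the target (ret) before fitting
transform3 = QuantileTransformer(
    output_distribution="normal"
)

# the random forest is fit on the transformed target;
# predict() maps its output back to the original units
model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(random_state=0),
    transformer=transform3
)

pipe = make_pipeline(
    transform1,
    poly,
    transform2,
    model
)

# step name __ argument name __ parameter
param_grid = {
    "transformedtargetregressor__regressor__max_depth": [4, 6, 8]
}

cv = GridSearchCV(
    pipe,
    param_grid=param_grid
)

X = data[["roeq", "mom12m"]]
y = data["ret"]

cv.fit(X, y)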
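The code above relies on transform1, poly, transform2 and data from earlier notebooks. To run it on its own, here is a sketch with assumptions stated plainly: the data are synthetic (made-up roeq, mom12m and fat-tailed ret), and the three feature steps are hypothetical stand-ins (QuantileTransformer, PolynomialFeatures, StandardScaler), not necessarily the ones used before. n_estimators=50 only keeps the run short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, QuantileTransformer, StandardScaler

# Synthetic stand-in for the 2021-12 data (made-up numbers)
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "roeq": rng.normal(0.02, 0.05, n),
    "mom12m": rng.normal(0.1, 0.4, n),
})
# fat-tailed noise so the target has outliers
data["ret"] = 0.5 * data.roeq + 0.05 * data.mom12m + 0.1 * rng.standard_t(3, n)

# Hypothetical stand-ins for transform1, poly, transform2
transform1 = QuantileTransformer(n_quantiles=100, output_distribution="normal")
poly = PolynomialFeatures(degree=2)
transform2 = StandardScaler()

model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(n_estimators=50, random_state=0),
    transformer=QuantileTransformer(n_quantiles=100, output_distribution="normal"),
)
pipe = make_pipeline(transform1, poly, transform2, model)

param_grid = {"transformedtargetregressor__regressor__max_depth": [4, 6, 8]}
cv = GridSearchCV(pipe, param_grid=param_grid)

X = data[["roeq", "mom12m"]]
y = data["ret"]
cv.fit(X, y)

print(cv.best_params_)
# predict() returns values in the units of ret, not normal scores
pred = cv.predict(X)
```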

Exercise

Run GridSearchCV with

  • MLPRegressor,
  • hidden_layer_sizes = (16, 8, 4, 2) and (8, 4, 2),
  • and a transformed target variable.

Industries

  • We used OneHotEncoder to create industry dummy variables
  • Let’s also use deviations from industry means as predictors
  • E.g., we may want to buy stocks with high ROE relative to their industry, rather than simply high-ROE stocks

Calculating deviations from means

# deviation of each stock's characteristic from its industry mean
data["roeqx"] = data.groupby("industry").roeq.transform(
    lambda x: x - x.mean()
)
data["mom12mx"] = data.groupby("industry").mom12m.transform(
    lambda x: x - x.mean()
)

X = data[["roeq", "mom12m", "roeqx", "mom12mx", "industry"]]

Then use OneHotEncoder and make_column_transformer as before.
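As a reminder of what that step produces, here is a small sketch on a made-up four-row frame (the values and industry labels are invented). OneHotEncoder turns industry into one dummy column per industry, and remainder="passthrough" keeps the numeric columns unchanged, placed after the dummies.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up example: two numeric features plus an industry label
X = pd.DataFrame({
    "roeq": [0.10, 0.02, 0.05, -0.01],
    "roeqx": [0.04, -0.04, 0.03, -0.03],
    "industry": ["Tech", "Tech", "Retail", "Retail"],
})

# One-hot encode industry; pass the numeric columns through unchanged
transform = make_column_transformer(
    (OneHotEncoder(), ["industry"]),
    remainder="passthrough",
)
Z = transform.fit_transform(X)
print(Z.shape)  # (4, 4): two industry dummies, then roeq and roeqx
```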

Or compute deviations in a loop

chars = ["roeq", "mom12m"]

for char in chars:
    data[char+"x"] = data.groupby("industry")[char].transform(
        lambda x: x - x.mean()
    )

newchars = chars + [char+"x" for char in chars]
X = data[newchars+["industry"]]
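An equivalent way to write the loop, sketched here on made-up data, is to subtract groupby(...).transform("mean"), which broadcasts each industry mean back to its rows without calling a Python function per group. This is an alternative idiom, not the one on the slide; it gives the same deviations, which sum to zero within each industry.

```python
import pandas as pd

# Made-up data: two industries, two characteristics
data = pd.DataFrame({
    "industry": ["Tech", "Tech", "Retail", "Retail"],
    "roeq": [0.10, 0.02, 0.05, -0.01],
    "mom12m": [0.30, 0.10, -0.20, 0.00],
})

chars = ["roeq", "mom12m"]
for char in chars:
    # transform("mean") returns the industry mean aligned with each row
    data[char + "x"] = data[char] - data.groupby("industry")[char].transform("mean")

newchars = chars + [char + "x" for char in chars]
X = data[newchars + ["industry"]]
print(X)
```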

Combining Column and Target Transformers

  • Let’s do GridSearchCV with dummy variables and a transformed target variable
  • Column transformer for features
  • Target transformer inside the model (TransformedTargetRegressor)
  • Combine both in a pipeline
  • We use slightly different names than in the 7a notebook, so regard 7a as obsolete
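A sketch of how these pieces could fit together, under stated assumptions: the data are synthetic with three made-up industries, n_estimators=50 only keeps the run short, and handle_unknown="ignore" is a choice added here so that an industry absent from a training fold does not raise an error at validation time. The column transformer becomes the step "columntransformer", so the max_depth key is unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor, make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, QuantileTransformer

# Synthetic data standing in for the real cross-section (made-up numbers)
rng = np.random.default_rng(1)
n = 300
data = pd.DataFrame({
    "industry": rng.choice(["Tech", "Retail", "Energy"], n),
    "roeq": rng.normal(0.02, 0.05, n),
    "mom12m": rng.normal(0.1, 0.4, n),
})
data["ret"] = 0.5 * data.roeq + rng.normal(0, 0.1, n)

# Deviations from industry means
chars = ["roeq", "mom12m"]
for char in chars:
    data[char + "x"] = data[char] - data.groupby("industry")[char].transform("mean")
newchars = chars + [char + "x" for char in chars]

# Features: industry dummies, numeric columns passed through
transform = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["industry"]),
    remainder="passthrough",
)
# Target: quantile-transformed inside the model
model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(n_estimators=50, random_state=0),
    transformer=QuantileTransformer(n_quantiles=100, output_distribution="normal"),
)
pipe = make_pipeline(transform, model)

param_grid = {"transformedtargetregressor__regressor__max_depth": [4, 6, 8]}
cv = GridSearchCV(pipe, param_grid=param_grid)
cv.fit(data[newchars + ["industry"]], data["ret"])
print(cv.best_params_)
```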